Automatic Identification of Word Translations from Unrelated English and German Corpora

Author

  • Reinhard Rapp
Abstract

Algorithms for the alignment of words in translated texts are well established. However, only recently have new approaches been proposed to identify word translations from non-parallel or even unrelated texts. This task is more difficult because most statistical clues useful in the processing of parallel texts cannot be applied to non-parallel texts. Whereas some studies of parallel texts have shown up to 99% of the word alignments to be correct, the accuracy for non-parallel texts has been around 30% up to now. The current study, which is based on the assumption that there is a correlation between the patterns of word co-occurrences in corpora of different languages, achieves a significant improvement, with about 72% of word translations identified correctly.
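
The assumption named in the abstract, that co-occurrence patterns correlate across languages, can be illustrated with a small sketch. This is not the paper's implementation: the toy corpora, the window size, the hypothetical `seed_lexicon` used to map context words between languages, and the choice of cosine similarity are all assumptions made only for illustration.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_translations(word, src_vectors, tgt_vectors, seed_lexicon):
    """Rank target words by how well their contexts match the source word's."""
    # Project the source word's context counts into the target vocabulary
    # using the (assumed) seed lexicon; unknown context words are dropped.
    projected = Counter()
    for context, count in src_vectors.get(word, {}).items():
        if context in seed_lexicon:
            projected[seed_lexicon[context]] += count
    # Only target words not already covered by the lexicon are candidates.
    known = set(seed_lexicon.values())
    scores = {
        candidate: cosine(projected, vector)
        for candidate, vector in tgt_vectors.items()
        if candidate not in known
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy, unrelated-in-principle monolingual corpora (illustrative only).
english = [
    "the teacher reads a book",
    "the student reads a book",
    "the teacher drinks water",
]
german = [
    "der lehrer liest ein buch",
    "der student liest ein buch",
    "der lehrer trinkt wasser",
]
# Hypothetical seed lexicon of already-known translations.
seed_lexicon = {
    "the": "der", "reads": "liest", "a": "ein",
    "book": "buch", "drinks": "trinkt", "water": "wasser",
}

en_vectors = cooccurrence_vectors(english)
de_vectors = cooccurrence_vectors(german)
ranking = rank_translations("teacher", en_vectors, de_vectors, seed_lexicon)
print(ranking)
```

In this toy setting the top-ranked candidate for "teacher" is the German word whose neighbours, once mapped through the seed lexicon, look most like the neighbours of "teacher" in the English corpus. Real corpora would need far larger vocabularies and some form of frequency normalisation, which the sketch leaves out.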


Similar resources

Fast automatic translation and morphological decomposition in Chinese-English bilinguals.

In this study, we investigated automatic translation from English to Chinese and subsequent morphological decomposition of translated Chinese compounds. In two lexical decision tasks, Chinese-English bilinguals responded to English target words that were preceded by masked unrelated primes presented for 59 ms. Unbeknownst to participants, the Chinese translations of the words in each critical p...


Identifying Word Translations in Non-Parallel Texts

Common algorithms for sentence and word-alignment allow the automatic identification of word translations from parallel texts. This study suggests that the identification of word translations should also be possible with non-parallel and even unrelated texts. The method proposed is based on the assumption that there is a correlation between the patterns of word cooccurrences in texts of differe...


Using Parallel Corpora to enrich Multilingual Lexical Resources

This paper describes the use of a bilingual vector model for the automatic discovery of German translations of English terms. The model is built by analysing co-occurrence patterns in a parallel corpus of English and German medical abstracts, a method also used for Cross-Lingual Information Retrieval. The model generates candidate German translations of English words using the cosine similarity m...


Bootstrapping Parallel Corpora

We present two methods for the automatic creation of parallel corpora. Whereas previous work into the automatic construction of parallel corpora has focused on harvesting them from the web, we examine the use of existing parallel corpora to bootstrap data for new language pairs. First, we extend existing parallel corpora using co-training, wherein machine translations are selectively added to t...


Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm

Selecting the right word translation among several options in the lexicon is a core problem for machine translation. We present a novel approach to this problem that can be trained using only unrelated monolingual corpora and a lexicon. By estimating word translation probabilities using the EM algorithm, we extend upon target language modeling. We construct a word translation model for German an...



Publication date: 1999